A Cyber Threat Intelligence Sharing Scheme based on Federated Learning for Network Intrusion Detection
The use of Machine Learning (ML) in the detection of network attacks has been
effective when systems are designed and evaluated within a single organisation.
However, it has been very challenging to design an ML-based detection system
that utilises heterogeneous network data samples originating from several
sources, mainly due to privacy concerns and the lack of a universal dataset
format.
In this paper, we propose a collaborative federated learning scheme to address
these issues. The proposed framework allows multiple organisations to join
forces in the design, training, and evaluation of a robust ML-based network
intrusion detection system. The threat intelligence scheme relies on two
critical aspects: first, the availability of network traffic data in a common
format, which allows meaningful patterns to be extracted across data sources;
second, the adoption of a federated learning mechanism, which avoids the need
to share sensitive user information between organisations. As a result, each
organisation benefits from the other organisations' cyber threat intelligence
while keeping its data private internally. The model is
trained locally and only the updated weights are shared with the remaining
participants in the federated averaging process. The framework has been
designed and evaluated in this paper by using two key datasets in a NetFlow
format, known as NF-UNSW-NB15-v2 and NF-BoT-IoT-v2. Two other common scenarios
are considered in the evaluation process: a centralised training method, where
the local data samples are shared with other organisations, and a localised
training method, where no threat intelligence is shared. The results
demonstrate the efficiency and effectiveness of the proposed framework, which
yields a universal ML model that effectively classifies benign and intrusive
traffic originating from multiple organisations without the need for local
data exchange.
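The federated averaging step described above can be sketched in a few lines of plain Python. This is a minimal illustration only, assuming size-weighted averaging of flat weight vectors; the client weights and layer shapes below are hypothetical, not taken from the paper:

```python
# Minimal sketch of federated averaging (FedAvg): each organisation trains
# locally and shares only its model weights; the server averages them.
# The weight vectors below are hypothetical stand-ins for model layers.

def federated_average(client_weights, client_sizes):
    """Weighted average of client weight vectors by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * (size / total)
    return averaged

# Two organisations with different amounts of local NetFlow data.
org_a = [0.2, 0.4, 0.6]   # locally trained weights, organisation A
org_b = [0.6, 0.8, 1.0]   # locally trained weights, organisation B
global_weights = federated_average([org_a, org_b], [100, 300])
print(global_weights)  # each weight pulled towards the larger client
```

Only the weight vectors cross organisational boundaries here, which is the privacy property the abstract describes: the raw flow records never leave the local site.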
Towards a Standard Feature Set of NIDS Datasets
Network Intrusion Detection Systems (NIDSs) datasets are essential tools used
by researchers for the training and evaluation of Machine Learning (ML)-based
NIDS models. There are currently five datasets, known as NF-UNSW-NB15,
NF-BoT-IoT, NF-ToN-IoT, NF-CSE-CIC-IDS2018 and NF-UQ-NIDS, which share a
common feature set. However, their performance in classifying network
traffic, particularly using the multi-classification method, is often unreliable.
Therefore, this paper proposes a standard NetFlow feature set to be used in
future NIDS datasets, given the tremendous benefits of having a common feature
set. NetFlow has been widely utilised in the networking industry for its
practical scaling properties. The evaluation is done by extracting and labeling
the proposed features from four well-known datasets. The newly generated
datasets are known as NF-UNSW-NB15-v2, NF-BoT-IoT-v2, NF-ToN-IoT-v2,
NF-CSE-CIC-IDS2018-v2 and NF-UQ-NIDS-v2. Their performance has been compared
to that of the respective original datasets using an Extra Trees classifier,
showing a great improvement in attack detection accuracy. The datasets have
been made publicly available for research purposes.
Comment: 13 pages, 4 figures, 13 tables. arXiv admin note: substantial text
overlap with arXiv:2011.0914
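The extraction-and-labelling step mentioned above can be illustrated with a small sketch: flow records are tagged by matching their identifying tuple against ground-truth attack flows. The field names and the attack tuple below are illustrative assumptions, not the paper's exact pipeline:

```python
# Sketch: label NetFlow-style records by matching a flow tuple against a
# ground-truth table of known attack flows. Field names are illustrative
# NetFlow-like names, not the paper's exact schema.

ATTACK_FLOWS = {
    # (src_ip, dst_ip, dst_port, protocol) -> attack class (hypothetical)
    ("10.0.0.5", "10.0.0.9", 80, "TCP"): "DoS",
}

def label_flow(record):
    key = (record["IPV4_SRC_ADDR"], record["IPV4_DST_ADDR"],
           record["L4_DST_PORT"], record["PROTOCOL"])
    return ATTACK_FLOWS.get(key, "Benign")

flow = {"IPV4_SRC_ADDR": "10.0.0.5", "IPV4_DST_ADDR": "10.0.0.9",
        "L4_DST_PORT": 80, "PROTOCOL": "TCP"}
print(label_flow(flow))  # -> DoS
```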
From Zero-Shot Machine Learning to Zero-Day Attack Detection
The standard ML methodology assumes that the test samples are derived from a
set of pre-observed classes used in the training phase, where the model
extracts and learns useful patterns to detect new data samples belonging to
the same classes. However, in certain applications such as Network Intrusion
Detection Systems, it is challenging to obtain data samples for all the attack
classes that the model will most likely observe in production. ML-based NIDSs
face new attack traffic, known as zero-day attacks, that is not present in the
training of the learning models because it did not exist at training time. In this
paper, a zero-shot learning methodology has been proposed to evaluate the ML
model performance in the detection of zero-day attack scenarios. In the
attribute learning stage, the ML models map the network data features to
distinguish semantic attributes from known attack (seen) classes. In the
inference stage, the models are evaluated in the detection of zero-day attack
(unseen) classes by constructing the relationships between known attacks and
zero-day attacks. A new metric, named the Zero-day Detection Rate, is defined
to measure the effectiveness of the learning model in the inference stage. The
results demonstrate that the majority of the attack classes do not represent
significant risks to organisations adopting an ML-based NIDS in a zero-day
attack scenario. However, for certain attack groups identified in this paper,
such systems are not effective in applying the learnt attributes of attack
behaviour to detect them as malicious. Further analysis was conducted
using the Wasserstein Distance technique to measure how different such attacks
are from other attack types used in the training of the ML model. The results
demonstrate that sophisticated attacks with a low zero-day detection rate have
a significantly distinct feature distribution compared to the other attack
classes.
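Both quantities in this abstract can be sketched in a few lines. The definitions below are assumed simplifications: the Zero-day Detection Rate is read as the fraction of unseen-attack samples flagged malicious, and the Wasserstein distance as the 1-D empirical W1 between two equally sized samples; the paper's exact formulations may differ:

```python
# Sketch: Zero-day Detection Rate (fraction of unseen-attack samples that
# the model flags as malicious) and a 1-D empirical Wasserstein-1 distance
# between two equally sized samples.

def zero_day_detection_rate(predictions):
    """predictions: 1 = flagged malicious, 0 = missed, on zero-day samples."""
    return sum(predictions) / len(predictions)

def wasserstein_1d(xs, ys):
    """W1 between two equally sized 1-D samples: mean |x_(i) - y_(i)|
    over the order statistics of each sample."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

preds = [1, 1, 0, 1]                         # model output on zero-day samples
print(zero_day_detection_rate(preds))        # -> 0.75
print(wasserstein_1d([0, 1, 2], [3, 4, 5]))  # -> 3.0
```

A large Wasserstein distance between a zero-day class and the training classes corresponds to the paper's observation that such attacks have a distinct feature distribution, which explains their low detection rate.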
XG-BoT: An Explainable Deep Graph Neural Network for Botnet Detection and Forensics
In this paper, we propose XG-BoT, an explainable deep graph neural network
model for botnet node detection. The proposed model is mainly composed of a
botnet detector and an explainer for automatic forensics. The XG-BoT detector
can effectively detect malicious botnet nodes under large-scale networks.
Specifically, it utilizes a grouped reversible residual connection with a graph
isomorphism network to learn expressive node representations from the botnet
communication graphs. The explainer in XG-BoT can perform automatic network
forensics by highlighting suspicious network flows and related botnet nodes. We
evaluated XG-BoT on real-world, large-scale botnet network graphs. Overall,
XG-BoT is able to outperform the state-of-the-art in terms of evaluation
metrics. In addition, we show that the XG-BoT explainer can generate useful
explanations based on GNNExplainer for automatic network forensics.
Comment: 6 pages, 3 figures
Exploring Edge TPU for Network Intrusion Detection in IoT
This paper explores Google's Edge TPU for implementing a practical network
intrusion detection system (NIDS) at the edge of IoT, based on a deep learning
approach. While there are a significant number of related works that explore
machine learning based NIDS for the IoT edge, they generally do not consider
the issue of the required computational and energy resources. The focus of this
paper is the exploration of deep learning-based NIDS at the edge of IoT, in
particular its computational and energy efficiency. Specifically, the paper
studies Google's Edge TPU as a hardware platform, and considers the following
three key metrics: computation (inference) time, energy efficiency and the
traffic classification performance. Various scaled model sizes of two major
deep neural network architectures are used to investigate these three metrics.
The performance of the Edge TPU-based implementation is compared with that of
an energy efficient embedded CPU (ARM Cortex A53). Our experimental evaluation
shows some unexpected results, such as the fact that the CPU significantly
outperforms the Edge TPU for small model sizes.
Comment: 22 pages, 11 figures
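The inference-time metric used in such a comparison can be sketched as below. The model here is a hypothetical stand-in function, since the paper's DNN architectures and the Edge TPU runtime are not reproduced:

```python
import time

# Sketch: measure mean inference latency of a model over repeated runs, as
# one would when comparing an Edge TPU against an embedded CPU. `model` is
# a hypothetical stand-in for a compiled DNN invocation.

def mean_inference_time(model, sample, runs=100):
    start = time.perf_counter()
    for _ in range(runs):
        model(sample)
    return (time.perf_counter() - start) / runs

def model(x):  # dummy model: a fixed dot product, purely illustrative
    return sum(a * b for a, b in zip(x, [0.1] * len(x)))

latency = mean_inference_time(model, [1.0] * 64)
print(f"mean latency: {latency * 1e6:.1f} us per inference")
```

Energy efficiency would additionally require a power measurement alongside the timer, which this sketch omits.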
E-GraphSAGE: A Graph Neural Network based Intrusion Detection System for IoT
This paper presents a new Network Intrusion Detection System (NIDS) based on
Graph Neural Networks (GNNs). GNNs are a relatively new sub-field of deep
neural networks, which can leverage the inherent structure of graph-based data.
Training and evaluation data for NIDSs are typically represented as flow
records, which can naturally be represented in a graph format. This establishes
the potential and motivation for exploring GNNs for network intrusion
detection, which is the focus of this paper. Current studies on machine
learning-based NIDSs consider network flows independently rather than taking
their interconnected patterns into account. This is the key
limitation in the detection of sophisticated IoT network attacks such as DDoS
and distributed port scan attacks launched by IoT devices. In this paper, we
propose \mbox{E-GraphSAGE}, a GNN approach that overcomes this limitation and
allows capturing both the edge features of a graph as well as the topological
information for network anomaly detection in IoT networks. To the best of our
knowledge, our approach is the first successful, practical, and extensively
evaluated approach to apply Graph Neural Networks to the problem of network
intrusion detection for IoT using flow-based data. Our extensive experimental
evaluation on four recent NIDS benchmark datasets shows that our approach
outperforms the state-of-the-art in terms of key classification metrics, which
demonstrates the potential of GNNs in network intrusion detection, and provides
motivation for further research.
Comment: 9 pages, 5 figures, 6 tables
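One round of the edge-feature message passing that GraphSAGE-style models perform can be sketched in plain Python. This is an assumed simplification (mean aggregation over incident edge features), not the paper's E-GraphSAGE implementation, which uses learned weights and neighbourhood sampling:

```python
# Sketch: one message-passing round in which a node aggregates the feature
# vectors of its incident edges (mean aggregation), as in GraphSAGE-style
# models extended to edge features. Flows are (src, dst, edge_features).

def aggregate_edge_features(flows, node):
    """Mean of edge feature vectors over edges incident to `node`."""
    incident = [f for s, d, f in flows if node in (s, d)]
    if not incident:
        return []
    dim = len(incident[0])
    return [sum(f[i] for f in incident) / len(incident) for i in range(dim)]

# Toy flow graph: hosts A, B, C with 2-dimensional flow features
# (e.g. rescaled byte and packet counts; values are hypothetical).
flows = [("A", "B", [1.0, 0.0]),
         ("B", "C", [3.0, 2.0])]
print(aggregate_edge_features(flows, "B"))  # -> [2.0, 1.0]
```

Because the aggregate for host B mixes information from both of its flows, a classifier on top of such representations can exploit interconnected traffic patterns, which is what the abstract identifies as missing from per-flow NIDS models.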
Mortality from gastrointestinal congenital anomalies at 264 hospitals in 74 low-income, middle-income, and high-income countries: a multicentre, international, prospective cohort study
Summary
Background Congenital anomalies are the fifth leading cause of mortality in children younger than 5 years globally.
Many gastrointestinal congenital anomalies are fatal without timely access to neonatal surgical care, but few studies
have been done on these conditions in low-income and middle-income countries (LMICs). We compared outcomes of
the seven most common gastrointestinal congenital anomalies in low-income, middle-income, and high-income
countries globally, and identified factors associated with mortality.
Methods We did a multicentre, international prospective cohort study of patients younger than 16 years, presenting to
hospital for the first time with oesophageal atresia, congenital diaphragmatic hernia, intestinal atresia, gastroschisis,
exomphalos, anorectal malformation, and Hirschsprung’s disease. Recruitment was of consecutive patients for a
minimum of 1 month between October, 2018, and April, 2019. We collected data on patient demographics, clinical
status, interventions, and outcomes using the REDCap platform. Patients were followed up for 30 days after primary
intervention, or 30 days after admission if they did not receive an intervention. The primary outcome was all-cause,
in-hospital mortality for all conditions combined and each condition individually, stratified by country income status.
We did a complete case analysis.
Findings We included 3849 patients with 3975 study conditions (560 with oesophageal atresia, 448 with congenital
diaphragmatic hernia, 681 with intestinal atresia, 453 with gastroschisis, 325 with exomphalos, 991 with anorectal
malformation, and 517 with Hirschsprung’s disease) from 264 hospitals (89 in
high-income countries, 166 in middle-income countries, and nine in low-income
countries) in 74 countries. Of the 3849 patients, 2231 (58·0%) were male.
Median gestational age at birth was 38 weeks (IQR 36–39) and median bodyweight at presentation was 2·8 kg (2·3–3·3).
Mortality among all patients was 37 (39·8%) of 93 in low-income countries, 583 (20·4%) of 2860 in middle-income
countries, and 50 (5·6%) of 896 in high-income countries (p<0·0001 between all country income groups).
Gastroschisis had the greatest difference in mortality between country income
strata (nine [90·0%] of ten in low-income countries, 97 [31·9%] of 304 in
middle-income countries, and two [1·4%] of 139 in high-income countries;
p≤0·0001 between all country income groups). Factors significantly associated with higher mortality for all patients
combined included country income status (low-income vs high-income countries, risk ratio 2·78 [95% CI 1·88–4·11],
p<0·0001; middle-income vs high-income countries, 2·11 [1·59–2·79], p<0·0001), sepsis at presentation (1·20
[1·04–1·40], p=0·016), higher American Society of Anesthesiologists (ASA) score at primary intervention
(ASA 4–5 vs ASA 1–2, 1·82 [1·40–2·35], p<0·0001; ASA 3 vs ASA 1–2, 1·58, [1·30–1·92], p<0·0001]), surgical safety
checklist not used (1·39 [1·02–1·90], p=0·035), and ventilation or parenteral nutrition unavailable when needed
(ventilation 1·96, [1·41–2·71], p=0·0001; parenteral nutrition 1·35, [1·05–1·74], p=0·018). Administration of
parenteral nutrition (0·61, [0·47–0·79], p=0·0002) and use of a peripherally inserted central catheter (0·65
[0·50–0·86], p=0·0024) or percutaneous central line (0·69 [0·48–1·00], p=0·049) were associated with lower mortality.
Interpretation Unacceptable differences in mortality exist for gastrointestinal
congenital anomalies between low-income, middle-income, and high-income
countries. Improving access to quality neonatal surgical care in LMICs will
be vital to achieve Sustainable Development Goal 3.2 of ending preventable deaths in neonates and children younger
than 5 years by 2030.
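As a worked example of the reported figures, a crude (unadjusted) risk ratio can be computed directly from the raw mortality counts. Note that the risk ratios quoted in the abstract are adjusted estimates, so the crude value below intentionally differs from them:

```python
# Crude risk ratio from the raw counts in the abstract: mortality was
# 37/93 in low-income and 50/896 in high-income countries. The abstract's
# RR of 2.78 is an adjusted estimate, so it differs from this crude value.

def risk_ratio(events_a, total_a, events_b, total_b):
    return (events_a / total_a) / (events_b / total_b)

rr = risk_ratio(37, 93, 50, 896)
print(f"crude RR (low- vs high-income): {rr:.1f}")  # -> 7.1
```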
An Explainable Machine Learning-based Network Intrusion Detection System for Enabling Generalisability in Securing IoT Networks
Machine Learning (ML)-based network intrusion detection systems bring many
benefits for enhancing the security posture of an organisation. Many systems
have been designed and developed in the research community, often achieving a
perfect detection rate when evaluated using certain datasets. However, the
large body of academic research has not translated into practical deployments,
for a number of reasons. This paper narrows the gap by evaluating the
generalisability of a common feature set across different network environments
and attack types. To this end, two feature sets
(NetFlow and CICFlowMeter) have been evaluated across three datasets, i.e.
CSE-CIC-IDS2018, BoT-IoT, and ToN-IoT. The results showed that the NetFlow
feature set enhances the two ML models' detection accuracy in detecting
intrusions across different datasets. In addition, due to the complexity of
the learning models, SHAP, an explainable AI methodology, has been adopted to
explain and interpret the classification decisions of the two ML models. The
Shapley values of the features have been analysed across multiple datasets to
determine the influence contributed by each feature towards the final ML
prediction.
Comment: 11 pages, 7 figures
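The Shapley-value idea behind SHAP can be computed exactly on a toy model with two features. This brute-force calculation over feature orderings only illustrates the attribution principle; it is not the SHAP library or the models used in the paper, and the feature names and payoff function are hypothetical:

```python
from itertools import permutations

# Sketch: exact Shapley values for a toy 2-feature model, averaging each
# feature's marginal contribution over all feature orderings. SHAP
# approximates the same quantity efficiently for real ML models.

def shapley_values(value_fn, features):
    contrib = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        present = set()
        for f in order:
            before = value_fn(present)
            present.add(f)
            contrib[f] += value_fn(present) - before
    return {f: c / len(orders) for f, c in contrib.items()}

# Toy "model output" as a function of which features are present:
# "bytes" adds 2.0, "packets" adds 1.0, and their interaction adds 0.5.
def value_fn(present):
    v = 0.0
    if "bytes" in present:
        v += 2.0
    if "packets" in present:
        v += 1.0
    if {"bytes", "packets"} <= present:
        v += 0.5
    return v

print(shapley_values(value_fn, ["bytes", "packets"]))
# -> {'bytes': 2.25, 'packets': 1.25}
```

The two attributions sum to the model output with both features present (3.5), which is the efficiency property that makes Shapley values a principled way to apportion a prediction across features.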
Inspection-L: A Self-Supervised GNN-Based Money Laundering Detection System for Bitcoin
Criminals have become increasingly experienced in using cryptocurrencies,
such as Bitcoin, for money laundering. Cryptocurrencies allow criminals to
hide their identities and to transfer hundreds of millions of dollars of dirty
funds through criminal digital wallets. However, this is considered a paradox
because cryptocurrencies are gold mines for open-source intelligence, giving
law enforcement agencies more power to conduct forensic analyses. This paper
proposes Inspection-L, a graph neural network (GNN) framework based on
self-supervised Deep Graph Infomax (DGI) combined with a supervised learning
algorithm, namely Random Forest (RF), to detect illicit transactions for AML.
To the best of our knowledge, our proposal is the first to apply
self-supervised GNNs to the problem of AML in Bitcoin. The proposed method has
been evaluated on the Elliptic dataset, and the results show that our approach
outperforms the baseline in terms of key classification metrics, which
demonstrates the potential of self-supervised GNNs in cryptocurrency illicit
transaction detection.